On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Neural Information Processing Systems

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
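
For concreteness, a standard form of the KL-regularized objective the abstract refers to is sketched below. The labels are ours, not fixed by the abstract: $\pi_0$ denotes the behavioral reference policy derived from expert demonstrations, and $\alpha \ge 0$ is the regularization temperature.

$$
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) - \alpha\, D_{\mathrm{KL}}\big( \pi(\cdot \mid s_t) \,\|\, \pi_0(\cdot \mid s_t) \big) \Big) \right]
$$

When $\pi_0$ is fit to demonstrations, the KL term steers online learning toward the demonstrated behavior; the pathology the abstract describes plausibly arises when this term, or its gradient, becomes extreme in states away from the demonstration data, consistent with the remark on uncertainty calibration in the supplementary material below.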


Supplementary Material Table of Contents

Neural Information Processing Systems

A Laplace behavioral reference policy may be able to mitigate some of the problems posed by Proposition 1 due to the heavy tails of the distribution (see the sketch below). Tikhonov regularization does not resolve the issue with the calibration of uncertainties. AWAC performs online fine-tuning of a policy pre-trained on offline data. BRAC regularizes the online policy against an offline behavioral policy, as our method does. DAPG incorporates offline data into policy gradients by pre-training with a behaviorally cloned policy and then augmenting the RL loss with a supervised-learning loss.
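
A sketch of why heavy tails help, under an assumed one-dimensional action space chosen purely for illustration: the KL penalty against the reference policy involves $-\log \pi_0(a)$, whose growth away from the reference mean $\mu$ differs sharply between a Gaussian and a Laplace reference,

$$
-\log \pi_0^{\mathcal{N}}(a) = \frac{(a - \mu)^2}{2\sigma^2} + \log\big(\sigma\sqrt{2\pi}\big),
\qquad
-\log \pi_0^{\mathrm{Lap}}(a) = \frac{|a - \mu|}{b} + \log(2b).
$$

The Gaussian penalty grows quadratically in $|a - \mu|$ and is scaled by $1/\sigma^2$, so it blows up when the fitted variance $\sigma^2$ is small, as tends to happen away from the demonstration data when uncertainties are poorly calibrated. The Laplace penalty grows only linearly in $|a - \mu|$, which is consistent with the remark above that its heavy tails may mitigate some of the problems posed by Proposition 1.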


